ENH: OOC architecture rewrite — new bulk I/O API and infrastructure #1568

Open
joeykleingers wants to merge 1 commit into BlueQuartzSoftware:develop from joeykleingers:worktree-ooc-architecture-rewrite

@joeykleingers joeykleingers commented Mar 24, 2026

Summary

Rewrites the out-of-core (OOC) architecture in simplnx, replacing the old chunk-based API with a new bulk I/O design built around copyIntoBuffer/copyFromBuffer on AbstractDataStore. Introduces the core infrastructure that the OOC-optimized filter algorithms (separate PR #1575) build upon.

Core Architecture Changes

  • Removed old chunk API from AbstractDataStore / IDataStore (loadChunk, getNumberOfChunks, getChunkLowerBounds, getChunkUpperBounds, getChunkShape)
  • Added copyIntoBuffer / copyFromBuffer pure virtual bulk I/O methods to AbstractDataStore with implementations in DataStore, EmptyDataStore, and HDF5ChunkedStore (in SimplnxOoc plugin)
  • Added StoreType enum (InMemory, OutOfCore, Empty) to IDataStore; IsOutOfCore() now checks StoreType instead of getChunkShape()
  • HDF5ChunkedStore performs I/O via HDF5 hyperslab selections with Z-slice-aligned default chunk shape {1,Y,X} for 3D data
  • copyFromBuffer fast path: skips read-modify-write for tuple-aligned writes
  • copyIntoBuffer fast path: direct span-based readTuples for tuple-aligned reads
  • HDF5 DatasetIO gains readTuples/writeTuples for direct hyperslab-based bulk tuple I/O
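
The bulk I/O contract described above can be pictured with a minimal sketch. The class names `BulkStoreSketch` and `InMemoryStoreSketch` and the pointer-plus-count signatures are assumptions made for illustration; they are not the actual `AbstractDataStore` declarations in simplnx.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-in for the bulk I/O contract (not the simplnx classes).
template <typename T>
class BulkStoreSketch
{
public:
  virtual ~BulkStoreSketch() = default;
  virtual std::size_t size() const = 0;
  // Read `count` elements starting at `start` from the store into `dst`.
  virtual void copyIntoBuffer(std::size_t start, T* dst, std::size_t count) const = 0;
  // Write `count` elements from `src` into the store starting at `start`.
  virtual void copyFromBuffer(std::size_t start, const T* src, std::size_t count) = 0;
};

// In-memory implementation: each bulk call is a single contiguous copy.
template <typename T>
class InMemoryStoreSketch final : public BulkStoreSketch<T>
{
public:
  explicit InMemoryStoreSketch(std::size_t count)
  : m_Data(count)
  {
  }

  std::size_t size() const override
  {
    return m_Data.size();
  }

  void copyIntoBuffer(std::size_t start, T* dst, std::size_t count) const override
  {
    checkRange(start, count);
    std::copy_n(m_Data.data() + start, count, dst);
  }

  void copyFromBuffer(std::size_t start, const T* src, std::size_t count) override
  {
    checkRange(start, count);
    std::copy_n(src, count, m_Data.data() + start);
  }

private:
  // Reject ranges that would run past the end of the store.
  void checkRange(std::size_t start, std::size_t count) const
  {
    if(start > m_Data.size() || count > m_Data.size() - start)
    {
      throw std::out_of_range("bulk I/O range exceeds store size");
    }
  }

  std::vector<T> m_Data;
};
```

An out-of-core implementation would keep the same two entry points but service each call with one file read or write, which is why callers are expected to pass large ranges instead of single elements.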

New Core Utilities

  • DispatchAlgorithm — Runtime dispatch between in-core (Direct) and OOC (Scanline/CCL) algorithm variants based on data store type
  • SliceBufferedTransfer — Type-dispatched Z-slice buffered tuple copy utility that eliminates per-element OOC overhead during morphological transfer phases
  • UnionFind — Vector-based disjoint set data structure with union-by-rank and path-halving compression for chunk-sequential CCL algorithms
  • SegmentFeatures OOC path — Z-slice CCL-based connected component labeling with UnionFind equivalence tracking, replacing BFS/DFS flood fill for OOC data
  • AlignSections OOC path — Bulk slice read/write with AlignSectionsTransferDataOocImpl
  • DataArrayUtilities bulk I/O: ImportFromBinaryFile, AppendData, CopyData, and mirror swap_ranges updated with chunked bulk I/O (runtime OOC check preserves original in-core performance)
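
For readers unfamiliar with the UnionFind entry, here is a textbook sketch of a vector-based disjoint set with union-by-rank and path halving. `UnionFindSketch` and its method names are illustrative assumptions; the class in this PR may expose a different interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Textbook disjoint-set sketch; not the simplnx UnionFind class.
class UnionFindSketch
{
public:
  explicit UnionFindSketch(std::size_t count)
  : m_Parent(count)
  , m_Rank(count, 0)
  {
    // Every element starts as the root of its own set.
    for(std::size_t i = 0; i < count; i++)
    {
      m_Parent[i] = i;
    }
  }

  // Returns the root of x's set. Path halving: each visited node is
  // re-pointed at its grandparent, shortening the path for later calls.
  std::size_t find(std::size_t x)
  {
    while(m_Parent[x] != x)
    {
      m_Parent[x] = m_Parent[m_Parent[x]];
      x = m_Parent[x];
    }
    return x;
  }

  // Merges the sets containing a and b. Returns false if already merged.
  // Union by rank: the shallower tree is attached under the deeper one.
  bool unite(std::size_t a, std::size_t b)
  {
    std::size_t rootA = find(a);
    std::size_t rootB = find(b);
    if(rootA == rootB)
    {
      return false;
    }
    if(m_Rank[rootA] < m_Rank[rootB])
    {
      std::swap(rootA, rootB);
    }
    m_Parent[rootB] = rootA;
    if(m_Rank[rootA] == m_Rank[rootB])
    {
      m_Rank[rootA]++;
    }
    return true;
  }

private:
  std::vector<std::size_t> m_Parent;
  std::vector<std::uint8_t> m_Rank;
};
```

In a slice-sequential labeling pass, provisional labels that turn out to touch each other are passed to `unite`, and a final pass maps every provisional label through `find` to get its resolved feature id.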

OOC Store Management

  • DataIOCollection / IDataIOManager — Updated for OOC store lifecycle management
  • ImportH5ObjectPathsAction — OOC-aware file import with recovery metadata
  • DataStoreIO — Detect OOC recovery attributes in ReadDataStore for safe data restoration
  • Legacy .dream3d support — Handle legacy file formats in OOC backfill operations

Test Infrastructure

  • CompareDataArrays rewritten to use copyIntoBuffer in 40K-element chunks instead of per-element operator[]
  • ForceOocAlgorithmGuard for dual-path test coverage
  • SIMPLNX_TEST_ALGORITHM_PATH CMake option (0=Both, 1=OOC-only, 2=InCore-only) for build-specific test path control
  • Programmatic test data builders with Z-slice batched bulk writes for OOC efficiency
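
The chunked comparison idea behind the CompareDataArrays rewrite can be sketched as follows. `FirstMismatchSketch`, the reader callables, and the sentinel constant are hypothetical names; only the 40,000-element chunk size comes from the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sentinel returned when the two sources are identical (illustrative name).
constexpr std::size_t k_NoMismatchSketch = static_cast<std::size_t>(-1);

// Compares two element sources chunk by chunk and returns the index of the
// first differing element. Each reader is called as read(start, dst, count)
// and stands in for a bulk copyIntoBuffer-style call.
template <typename T, typename ReadA, typename ReadB>
std::size_t FirstMismatchSketch(std::size_t total, ReadA&& readA, ReadB&& readB, std::size_t chunkSize = 40000)
{
  chunkSize = std::max<std::size_t>(chunkSize, 1); // guard against a zero-sized chunk
  const std::size_t bufferSize = std::min(total, chunkSize);
  std::vector<T> bufA(bufferSize);
  std::vector<T> bufB(bufferSize);

  for(std::size_t offset = 0; offset < total; offset += chunkSize)
  {
    // The last chunk may be shorter than chunkSize.
    const std::size_t count = std::min(chunkSize, total - offset);
    readA(offset, bufA.data(), count);
    readB(offset, bufB.data(), count);
    for(std::size_t i = 0; i < count; i++)
    {
      if(!(bufA[i] == bufB[i]))
      {
        return offset + i;
      }
    }
  }
  return k_NoMismatchSketch;
}
```

With 100,000 elements this issues three bulk reads per array instead of 100,000 per-element accesses, which is the cost the rewrite is aiming to avoid on out-of-core stores.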

Related PRs

  • #1575: OOC-optimized filter algorithms that build on this infrastructure

Test Plan

  • Tests pass on in-core build
  • Tests pass on out-of-core build
  • In-core performance verified: no regression on utility changes (CopyData, AppendData, mirror swaps)

@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch from b4ef97f to 99b49ed Compare March 24, 2026 18:13
@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch 2 times, most recently from b4ef97f to bb09048 Compare March 24, 2026 18:51
@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch 4 times, most recently from 102c436 to b4c1358 Compare April 2, 2026 00:55
@joeykleingers joeykleingers changed the title from "WIP: OOC architecture rewrite — new bulk I/O API, SimplnxOoc plugin, and filter optimizations" to "ENH: OOC architecture rewrite — new bulk I/O API and infrastructure" Apr 2, 2026
@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch 6 times, most recently from 2bd614a to 110c054 Compare April 8, 2026 17:41
@joeykleingers joeykleingers changed the title from "ENH: OOC architecture rewrite — new bulk I/O API and infrastructure" to "WIP: ENH: OOC architecture rewrite — new bulk I/O API and infrastructure" Apr 8, 2026
@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch 4 times, most recently from 35aecd0 to 3a88bbf Compare April 16, 2026 13:03
@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch from bdfed87 to 6fbfc8d Compare April 27, 2026 18:18
@joeykleingers joeykleingers marked this pull request as ready for review April 28, 2026 00:08
@joeykleingers joeykleingers changed the title from "WIP: ENH: OOC architecture rewrite — new bulk I/O API and infrastructure" to "ENH: OOC architecture rewrite — new bulk I/O API and infrastructure" Apr 28, 2026
@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch from 2095f32 to 65c4ba8 Compare May 5, 2026 13:30
@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch from 65c4ba8 to 653cf62 Compare May 12, 2026 15:03
@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch 2 times, most recently from 1773602 to 4834fa1 Compare June 1, 2026 14:06
Introduce a compile-time out-of-core storage capability for simplnx: a
DataStore abstraction that can back arrays on disk, an IO/recovery path
that registers and finalizes those stores, a memory-budget manager that
decides in-core vs OOC placement, and chunk-aware parallel dispatch so
algorithms stream chunked data efficiently. OOC is gated behind the
compile-time SIMPLNX_USE_OOC switch and is wholly absent from in-core
builds. Every OOC entry point is a direct SimplnxOoc:: function call
guarded by #ifdef SIMPLNX_USE_OOC and compiled into libsimplnx -- there
is no runtime hook, function pointer, or plugin indirection.
93 files changed, +5501 / -1229 lines.

================================================================================
1. Compile-time OOC switch and SimplnxOoc integration
================================================================================
Files: CMakeLists.txt, cmake/SimplnxConfig.hpp.in

Add the SIMPLNX_USE_OOC option and a generated SimplnxConfig.hpp that
carries the macro through simplnx's PUBLIC include path, so the value is
identical (ODR-safe) across every consumer and MOC. When enabled, the
private SimplnxOoc sources are compiled in place into libsimplnx -- no
copy into the build tree, no separate library, no simplnx->SimplnxOoc
link cycle -- and a source-dir guard fails configuration if the path is
missing. simplnx core calls SimplnxOoc:: functions directly, all behind
#ifdef SIMPLNX_USE_OOC.
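
A minimal sketch of the gating pattern, assuming the macro arrives from a generated config header. The namespace `SimplnxOocSketch` and both function names are hypothetical stand-ins, not the real SimplnxOoc entry points.

```cpp
#include <cassert>
#include <string>

// In a real build the macro would come from the generated SimplnxConfig.hpp.
// It is left undefined here, which models an in-core build.
// #define SIMPLNX_USE_OOC

#ifdef SIMPLNX_USE_OOC
namespace SimplnxOocSketch
{
// Hypothetical OOC entry point; it only exists when the switch is on.
inline std::string describeBackend()
{
  return "out-of-core";
}
} // namespace SimplnxOocSketch
#endif

// Core-side call site: a direct call behind the guard, with no function
// pointer or runtime hook. In-core builds never see the OOC symbol.
inline std::string DescribeBackendSketch()
{
#ifdef SIMPLNX_USE_OOC
  return SimplnxOocSketch::describeBackend();
#else
  return "in-core";
#endif
}
```

Because the macro is defined in one generated header on the public include path, every translation unit that includes it agrees on which branch is compiled.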

================================================================================
2. DataStore layer: in-core / out-of-core abstraction
================================================================================
Files: src/simplnx/DataStructure/{AbstractDataStore,DataStore,EmptyDataStore}.hpp,
       src/simplnx/DataStructure/EmptyStringStore.hpp,
       src/simplnx/Utilities/{DataStoreUtilities,ArrayCreationUtilities}.hpp

Generalize the data-store interface so an array can be backed in core or
out of core. Array creation resolves its storage format with a direct,
compile-time-gated SimplnxOoc::resolveFormat call and builds a chunked
store via SimplnxOoc::createChunkedStore / createChunkedListStore when the
OOC format is requested; in-core builds compile none of these calls and
stay fully in core. Add an empty string store and extend the empty data
store for the deferred-load case.

================================================================================
3. IO collection, format registration, and recovery write
================================================================================
Files: src/simplnx/DataStructure/IO/Generic/DataIOCollection.{hpp,cpp},
       src/simplnx/DataStructure/IO/HDF5/{DatasetIO,NeighborListIO,DataStructureWriter}.*

The DataIOCollection constructor calls SimplnxOoc::registerIOManager to
add the OOC IO manager, and finalizes stores on write via
SimplnxOoc::finalizeStores -- replacing the former plugin's load-time
setup with direct calls. The HDF5 writer calls
SimplnxOoc::maybeWriteRecoveryArray directly so dirty OOC arrays are
flushed during WriteFile.

================================================================================
4. DREAM3D file loading API and deferred load
================================================================================
Files: src/simplnx/Utilities/Parsing/DREAM3D/Dream3dIO.{hpp,cpp},
       test/Dream3dLoadingApiTest.cpp, test/IOFormat.cpp

Rework the .dream3d loader API. In OOC builds, import is deferred via a
direct SimplnxOoc::handleImport call (lazy load) and writes run under a
SimplnxOoc::RecoveryWriteGuard; in-core builds load eagerly. Migrate the
compression test cases to the new loader API and add coverage for the
loading paths.

================================================================================
5. Memory budget management and the nxrunner --memory-budget flag
================================================================================
Files: src/simplnx/Utilities/MemoryBudgetManager.{hpp,cpp},
       src/simplnx/Core/Preferences.{hpp,cpp}, src/nxrunner/src,
       test/MemoryBudgetManagerTest.cpp

Add a MemoryBudgetManager that tracks the budget governing in-core vs OOC
placement. Preferences seeds the OOC base directory (via a direct
SimplnxOoc::setBaseDirectory call) and default format on startup and gains
a removeValue helper. nxrunner exposes a --memory-budget flag wired through
to the budget manager, with parsing hardened against NaN, inf, and
trailing garbage.
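
One way to harden numeric flag parsing along these lines, as a sketch only: `ParseMemoryBudgetSketch` is a hypothetical helper, the accepted unit is left unspecified, and the actual nxrunner parser may be written differently.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <cstdlib>
#include <optional>
#include <string>

// Parses a positive, finite number from the whole string.
// Returns std::nullopt for empty input, NaN, inf, out-of-range values,
// non-positive values, or any characters left over after the number.
inline std::optional<double> ParseMemoryBudgetSketch(const std::string& text)
{
  if(text.empty())
  {
    return std::nullopt;
  }
  errno = 0;
  char* end = nullptr;
  const double value = std::strtod(text.c_str(), &end);
  if(end == text.c_str())
  {
    return std::nullopt; // nothing was parsed
  }
  if(end != text.c_str() + text.size())
  {
    return std::nullopt; // trailing garbage such as "4GBx"
  }
  if(errno == ERANGE)
  {
    return std::nullopt; // overflow or underflow
  }
  if(!std::isfinite(value) || value <= 0.0)
  {
    return std::nullopt; // strtod accepts "nan" and "inf", so reject them here
  }
  return value;
}
```

The explicit `std::isfinite` check matters because `std::strtod` treats "nan", "inf", and "infinity" as valid input.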

================================================================================
6. Chunk-aware parallel dispatch and the Extent utility
================================================================================
Files: src/simplnx/Utilities/AlgorithmDispatch.hpp, src/simplnx/Common/Extent.hpp,
       test/{IParallelAlgorithmTest,ExtentTest}.cpp

Extend algorithm dispatch with direct vs scanline paths so chunked OOC
arrays are traversed in chunk-sequential order rather than thrashing the
cache. Add an Extent helper for region math. Both gain unit tests.
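
The direct-versus-scanline choice can be sketched as a small dispatcher. `StoreTypeSketch` mirrors the StoreType values named in the PR description, while `DispatchSketch`, its parameters, and the `forceOoc` flag (standing in for a test-time override) are assumptions, not the signature in AlgorithmDispatch.hpp.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <string>

// Illustrative copy of the store categories; not the simplnx enum itself.
enum class StoreTypeSketch
{
  InMemory,
  OutOfCore,
  Empty
};

// Runs the scanline variant if any participating store is out of core
// (or if the caller forces it), otherwise runs the direct variant.
// Both callables must return the same type.
template <typename DirectFn, typename ScanlineFn>
auto DispatchSketch(std::initializer_list<StoreTypeSketch> stores, bool forceOoc, DirectFn&& direct, ScanlineFn&& scanline) -> decltype(direct())
{
  const bool anyOoc = std::any_of(stores.begin(), stores.end(), [](StoreTypeSketch type) { return type == StoreTypeSketch::OutOfCore; });
  if(forceOoc || anyOoc)
  {
    return scanline();
  }
  return direct();
}
```

Keeping the decision in one place lets an algorithm ship a random-access implementation for in-core data and a chunk-sequential one for OOC data without each filter re-deriving the check.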

================================================================================
7. Filter, action, and test updates for OOC
================================================================================
Files: src/Plugins/SimplnxCore/.../Filters (+Algorithms, incl. FillBadData),
       src/simplnx/Filter/Actions, plugin tests, test/UnitTestCommon

Propagate the chosen store format through the data-creation actions and
update the affected SimplnxCore filters/algorithms (notably FillBadData,
reworked to stream Z-slabs via bulk I/O) and OrientationAnalysis/
ITKImageProcessing call sites to the new APIs. Extend UnitTestCommon and
plugin tests accordingly.
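
Streaming a volume one Z-slab at a time through bulk I/O can be sketched like this. It only illustrates the access pattern; `ForEachZSlabSketch` and its callables are hypothetical, and this is not the FillBadData algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Visits a Z-major volume one slab (dimY * dimX elements) at a time.
// read(start, dst, count) and write(start, src, count) stand in for bulk
// store calls; process(z, slab) may modify the slab in place.
template <typename T, typename ReadFn, typename WriteFn, typename SlabFn>
void ForEachZSlabSketch(std::size_t dimZ, std::size_t dimY, std::size_t dimX, ReadFn&& read, WriteFn&& write, SlabFn&& process)
{
  const std::size_t slabSize = dimY * dimX;
  std::vector<T> slab(slabSize); // one reusable buffer bounds memory use
  for(std::size_t z = 0; z < dimZ; z++)
  {
    const std::size_t offset = z * slabSize;
    read(offset, slab.data(), slabSize);  // one bulk read per slab
    process(z, slab);
    write(offset, slab.data(), slabSize); // one bulk write per slab
  }
}
```

Memory use is bounded by a single slab regardless of volume size, and the number of I/O calls scales with the Z dimension instead of the element count.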

Verified: the full OOC-Release build (libsimplnx with the SimplnxOoc
sources compiled in place, all plugins, and all test executables) compiles
and links cleanly; InCore-Release (SIMPLNX_USE_OOC=OFF) builds and the
DREAM3D-NX app launches. The ctest suite was not run.

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
@joeykleingers joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch from 4834fa1 to d4f634b Compare June 1, 2026 15:13